@seanses (Collaborator) commented Dec 18, 2024

Still in progress: adding the number of retries to the metrics.

Example tracing:

{"timestamp":"2024-12-18T10:04:56.945534Z","level":"INFO","fields":{"message":"Xorb upload accumulated rate: 74.54 Mbps"},"filename":"/home/ubuntu/data/xet-core/data/src/parallel_xorb_uploader.rs","line_number":90}
{"timestamp":"2024-12-18T10:04:56.945589Z","level":"INFO","fields":{"message":"Xorb upload instantaneous rate: 74.54 Mbps"},"filename":"/home/ubuntu/data/xet-core/data/src/parallel_xorb_uploader.rs","line_number":90}
{"timestamp":"2024-12-18T10:04:59.166637Z","level":"INFO","fields":{"message":"Xorb upload accumulated rate: 284.91 Mbps"},"filename":"/home/ubuntu/data/xet-core/data/src/parallel_xorb_uploader.rs","line_number":90}
{"timestamp":"2024-12-18T10:04:59.166665Z","level":"INFO","fields":{"message":"Xorb upload instantaneous rate: 966.45 Mbps"},"filename":"/home/ubuntu/data/xet-core/data/src/parallel_xorb_uploader.rs","line_number":90}
...
{"timestamp":"2024-12-18T10:05:14.036182Z","level":"INFO","fields":{"message":"Xorb upload instantaneous rate: 2.62 Gbps"},"filename":"/home/ubuntu/data/xet-core/data/src/parallel_xorb_uploader.rs","line_number":90}
{"timestamp":"2024-12-18T10:05:16.244455Z","level":"INFO","fields":{"message":"Xorb upload accumulated rate: 1.32 Gbps"},"filename":"/home/ubuntu/data/xet-core/data/src/parallel_xorb_uploader.rs","line_number":90}
{"timestamp":"2024-12-18T10:05:16.244487Z","level":"INFO","fields":{"message":"Xorb upload instantaneous rate: 2.19 Gbps"},"filename":"/home/ubuntu/data/xet-core/data/src/parallel_xorb_uploader.rs","line_number":90}
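The two rates in the log can be read as follows: the accumulated rate averages over everything uploaded since tracking began, while the instantaneous rate covers only the window since the previous report. This also explains why both values match on the first report. Below is a minimal sketch of that bookkeeping. `RateTracker`, `bits_per_sec`, and `format_rate` are hypothetical names, and the decimal unit scaling (1 Mbps = 10^6 bit/s) is an assumption, not something taken from the PR.

```rust
use std::time::{Duration, Instant};

/// Hypothetical tracker; names are illustrative, not taken from the PR.
struct RateTracker {
    start: Instant,          // when tracking began
    total_bytes: u64,        // bytes uploaded since `start`
    last_report: Instant,    // when the previous report was produced
    bytes_since_report: u64, // bytes uploaded since `last_report`
}

/// Converts a byte count over an elapsed time into bits per second.
fn bits_per_sec(n_bytes: u64, elapsed: Duration) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs <= 0.0 {
        return 0.0; // avoid dividing by zero on an empty window
    }
    (n_bytes as f64 * 8.0) / secs
}

/// Formats a rate with a decimal unit prefix (assumed scaling).
fn format_rate(bps: f64) -> String {
    const UNITS: [&str; 4] = ["bps", "Kbps", "Mbps", "Gbps"];
    let mut value = bps;
    let mut unit = 0;
    while value >= 1000.0 && unit < UNITS.len() - 1 {
        value /= 1000.0;
        unit += 1;
    }
    format!("{:.2} {}", value, UNITS[unit])
}

impl RateTracker {
    fn new(now: Instant) -> Self {
        RateTracker {
            start: now,
            total_bytes: 0,
            last_report: now,
            bytes_since_report: 0,
        }
    }

    /// Records a finished upload of `n_bytes`.
    fn record(&mut self, n_bytes: u64) {
        self.total_bytes += n_bytes;
        self.bytes_since_report += n_bytes;
    }

    /// Returns (accumulated, instantaneous) in bits/s and opens a new window.
    fn report(&mut self, now: Instant) -> (f64, f64) {
        let accumulated = bits_per_sec(self.total_bytes, now.duration_since(self.start));
        let instantaneous =
            bits_per_sec(self.bytes_since_report, now.duration_since(self.last_report));
        self.last_report = now;
        self.bytes_since_report = 0;
        (accumulated, instantaneous)
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside `report`, keeps the arithmetic easy to test.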

@seanses marked this pull request as draft December 18, 2024 10:11
@seanses requested a review from port8080 December 18, 2024 10:17
@seanses changed the title from "Track and report egress rate" to "Track and report network stat" Dec 18, 2024
type XorbUploadValueType = (MerkleHash, Vec<u8>, Vec<(MerkleHash, usize)>);
struct NetworkStatCheckPoint {
    n_bytes: u64,
    start: Instant,

In #121 I found std::time::SystemTime more useful than Instant for sending these over.
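One likely reason for this suggestion: an `Instant` is opaque and only meaningful within the process that created it, whereas a `SystemTime` can be converted to an offset from `UNIX_EPOCH` and sent elsewhere. The trade-off is that `SystemTime` is not monotonic, so durations computed from it can be skewed by clock adjustments. A minimal sketch of the conversion, where `WireCheckPoint` and its fields are hypothetical and not the struct from this PR or from #121:

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical wire-friendly checkpoint; field names are illustrative.
#[derive(Debug, PartialEq)]
struct WireCheckPoint {
    n_bytes: u64,
    start_unix_ms: u64, // milliseconds since the Unix epoch
}

/// Converts a wall-clock time to milliseconds since the Unix epoch.
/// Times before the epoch are clamped to 0.
fn to_unix_millis(t: SystemTime) -> u64 {
    t.duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as u64)
        .unwrap_or(0)
}

/// Rebuilds a wall-clock time from milliseconds since the Unix epoch.
fn from_unix_millis(ms: u64) -> SystemTime {
    UNIX_EPOCH + Duration::from_millis(ms)
}

/// Builds a checkpoint that can be serialized as two plain integers.
fn checkpoint_at(start: SystemTime, n_bytes: u64) -> WireCheckPoint {
    WireCheckPoint {
        n_bytes,
        start_unix_ms: to_unix_millis(start),
    }
}
```

If both properties matter, one option is to keep an `Instant` for local rate arithmetic and attach a `SystemTime` only when a checkpoint leaves the process.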

use crate::errors::DataProcessingError::*;
use crate::errors::*;

const DEFAULT_NETWORK_STAT_REPORT_INTERVAL_SEC: u32 = 2; // 2 s
@port8080 commented Dec 18, 2024

You probably have this set low for testing. I would prefer a larger value in prod, maybe 100 s.
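One way to reconcile a short interval for testing with a long one in production is to keep the production value as the default and accept an optional override. The sketch below assumes the 100 s figure floated in this comment. The `Duration`-typed constant and the `report_interval` helper are hypothetical, and where the override string comes from (environment variable, config file, test fixture) is left open.

```rust
use std::time::Duration;

/// Assumed production default, using the 100 s suggested in review.
const DEFAULT_NETWORK_STAT_REPORT_INTERVAL: Duration = Duration::from_secs(100);

/// Hypothetical helper: parses an optional override given in whole seconds.
/// Missing, unparsable, or zero values fall back to the default.
fn report_interval(override_secs: Option<&str>) -> Duration {
    override_secs
        .and_then(|s| s.trim().parse::<u64>().ok())
        .filter(|&secs| secs > 0)
        .map(Duration::from_secs)
        .unwrap_or(DEFAULT_NETWORK_STAT_REPORT_INTERVAL)
}
```

Typing the constant as a `Duration` also makes the unit explicit, so the `_SEC` suffix and the trailing unit comment are no longer needed.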

while let Some(result) = upload_tasks.join_next().await {
result??;
let metrics = result??;
let mut egress_rate = self.egress_stat.lock().await;

Try the lock and report optimistically. I think it is completely OK to skip the update_and_report call if there is lock contention.
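A minimal sketch of this suggestion follows. The snippet under review awaits the lock, which suggests an async mutex. To stay dependency-free this sketch uses `std::sync::Mutex::try_lock` instead, and `tokio::sync::Mutex` offers a `try_lock` method as well. `SharedStat`, `EgressStat`, and `record_upload` are hypothetical names. The atomic counter is an added assumption so that bytes from a skipped report are carried into the next successful one rather than lost.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for the statistics guarded by the lock.
#[derive(Default)]
struct EgressStat {
    n_bytes: u64,   // total bytes accounted for so far
    n_reports: u64, // how many times a report was produced
}

impl EgressStat {
    /// Placeholder for the real update_and_report logic.
    fn update_and_report(&mut self, n_bytes: u64) {
        self.n_bytes += n_bytes;
        self.n_reports += 1;
    }
}

/// Hypothetical shared state: a lock-free byte counter plus the locked stats.
#[derive(Default)]
struct SharedStat {
    pending_bytes: AtomicU64,
    report: Mutex<EgressStat>,
}

impl SharedStat {
    /// Always counts the bytes, but only reports if the lock is free.
    /// Returns true if a report was made, false if it was skipped.
    fn record_upload(&self, n_bytes: u64) -> bool {
        self.pending_bytes.fetch_add(n_bytes, Ordering::Relaxed);
        match self.report.try_lock() {
            Ok(mut stat) => {
                // Drain everything counted so far, including skipped updates.
                let drained = self.pending_bytes.swap(0, Ordering::Relaxed);
                stat.update_and_report(drained);
                true
            }
            // Contended (or poisoned): skip, a later caller will report.
            Err(_) => false,
        }
    }
}
```

With this shape an upload task never waits on the statistics lock, at the cost of an occasional report being delayed until the next uncontended call.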

3 participants